Added OpenVINO vision model support #33
Merged
Support OpenVINO vision model #32
This feature request adds support for vision-language models to our existing framework. The framework currently supports non-vision models, and that support needs to be extended to vision models, which are loaded and processed differently.
To achieve this, I have implemented the following:
Separated Initialization for Vision and Non-Vision Models:
- Vision models are initialized with the `ov_phi3_vision.py` script provided by OpenVINO.
Quantized Vision Model Support:
- `Phi-3.5-vision-instruct-int4-ov`
Implementation Details
Vision Model Initialization:
- The `OpenVinoEngine` class now checks whether the model is a vision model during initialization.
- Uses the `ov_phi3_vision.py` script to load and initialize the model.
- Uses the `AutoProcessor` class from the `transformers` library.
Non-Vision Model Initialization:
Streamlined Generation Process:
- The `generate_vision` method has been updated to log prompt length, new tokens generated, time to first token, prompt tokens per second, and new tokens per second.
References
- `ov_phi3_vision.py`
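
For reviewers, here is a minimal sketch of how the metrics logged by `generate_vision` could be derived from raw timings. It is not the code in this PR: the `GenerationStats` class, its field names, and the `summary` format are hypothetical. It also assumes that prompt throughput is measured over the time to first token and that new-token throughput is measured over the remaining decode time.

```python
from dataclasses import dataclass


@dataclass
class GenerationStats:
    """Hypothetical container for the timings of one generation call."""

    prompt_tokens: int          # prompt length in tokens
    new_tokens: int             # number of tokens generated
    time_to_first_token: float  # seconds from call start to first token
    total_time: float           # seconds from call start to last token

    @property
    def prompt_tokens_per_second(self) -> float:
        # Assumption: the prompt is processed during the time to first token.
        if self.time_to_first_token <= 0:
            return 0.0
        return self.prompt_tokens / self.time_to_first_token

    @property
    def new_tokens_per_second(self) -> float:
        # Assumption: new tokens are attributed to the time after the first token.
        decode_time = self.total_time - self.time_to_first_token
        if decode_time <= 0:
            return 0.0
        return self.new_tokens / decode_time

    def summary(self) -> str:
        # One log line covering the five values named in the description.
        return (
            f"prompt length: {self.prompt_tokens}, "
            f"new tokens: {self.new_tokens}, "
            f"time to first token: {self.time_to_first_token:.2f}s, "
            f"prompt tokens/s: {self.prompt_tokens_per_second:.2f}, "
            f"new tokens/s: {self.new_tokens_per_second:.2f}"
        )
```

In practice the two timings would typically come from `time.perf_counter()` readings taken at call start, at the first streamed token, and at completion; the resulting `summary()` string can then be passed to whatever logger the engine already uses.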